In vqf/ciberAMP: mRNA and copy number data integrative algorithm.

Let's start!

CiberAMP requires several packages that will be automatically installed:

TCGAbiolinks, for TCGA data downloading, filtering, normalization and differential expression analysis (DEA).
SummarizedExperiment, to manage RNA-seq experiment data.
EDASeq, to normalize RNA-seq counts.
edgeR, to perform differential expression analysis (DEA) between two groups of samples.
RTCGAToolbox, to download the latest GISTIC2.0 run thresholded-by-gene data from the Broad Institute's Firehose data server.
dplyr and stringr, to manipulate data frames and strings in R.
ggplot2, plotly and shiny, to plot static (ggplot2) and interactive (plotly) graphs with the results.

Gene query

The algorithm requires two mandatory inputs: a list of gene symbols to analyze and the TCGA cohorts to analyze them in. The gene list can range from a single gene to the whole set of human genes.

genes.of.interest <- c("GENEA", "GENEB", "GENEC", ...)

In this example, we will study which genes are differentially expressed in association with their somatic copy number variations (SCNVs) in the head-and-neck and lung squamous cell carcinoma cohorts (TCGA-HNSC and TCGA-LUSC, respectively). In particular, we will focus on genes located on chromosomes 3, 5 and 8.

Getting genes from specific chromosomal locations

The first step is to create a vector with the symbols of the genes located on chromosomes 3, 5 and 8:

library(biomaRt)
library(stringr)

# First, we get all the genes on chromosomes 3, 5 and 8 using the biomaRt R package. If you have the latest version of R, you can set the "useCache" argument of getBM to TRUE.

ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", mirror = "useast")
genes <- getBM(attributes=c('chromosome_name', 'band', 'start_position', 'end_position', 'strand', 'hgnc_symbol'), filters = "chromosome_name", values = c("3", "5", "8"), mart=ensembl, useCache = FALSE)
genes <- genes[genes$hgnc_symbol != "", ]
genes <- genes[!duplicated(genes$hgnc_symbol), ]
# Recode the numeric strand (-1 / 1) as "-" / "+" in one vectorized step.
# This avoids comparing against a column that turns into character mid-loop.
genes$strand <- ifelse(genes$strand < 0, "-", "+")

Running CiberAMP

Now, we have (1) the list of gene symbols to query and (2) the TCGA cohorts to analyze (TCGA-HNSC and TCGA-LUSC). CiberAMP also allows you to designate a path to save your results. By default, it will use the current working directory.

library(ciberAMP)
# x <- ciberAMP(genes = genes$hgnc_symbol, cohorts = c("LUSC", "HNSC"), pat.percentage = 0, writePath = "PATH_TO_FOLDER")

Many other arguments let you customize your analysis:

genes: Vector of approved symbols of the queried genes to be analyzed.
cohorts: Vector of TCGA cohort IDs. See the function all_tumors for a list of available tumors. If it is left empty, CiberAMP will run on every cohort. You can consult the official IDs on the TCGA website: https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations
writePath: Path where results are stored. Defaults to the current folder. Tip: if you re-run CiberAMP with the same folder, the data already downloaded there are reused, which saves a lot of time and disk space. Be careful, though: your results are overwritten on every run.
pat.percentage: Minimum percentage of patients per group.
pp.cor.cut: Threshold to filter samples by array-array intensity correlation (AAIC). Passed to TCGAanalyze_Preprocessing.
norm.method: Method of normalization, such as gcContent or geneLength (default).
filt.method: Method of filtering, such as quantile (default), varFilter, filter1, filter2.
filt.qnt.cut: Threshold selected as mean for filtering. Defaults to 0.25.
filt.var.func: Filtering function. Defaults to IQR. See the genefilter documentation for available methods.
filt.var.cutoff: Threshold for filt.var.func.
filt.eta: Parameter for filter1. Defaults to 0.05.
filt.FDR.DEA: Threshold to filter differentially expressed genes according to their adjusted p-value.
filt.FC: Minimum log2(FC) value to consider a gene as differentially expressed. Defaults to 1.
cna.thr: Threshold level for copy-number variation analysis. Can be Deep, Shallow or Both.
exp.mat: Custom normalized RNA-seq counts expression matrix of only tumors. Defaults to NULL.
cna.mat: Custom copy-number analysis matrix of only tumors. Defaults to NULL.
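
To make the role of the filt.FDR.DEA and filt.FC thresholds more concrete, here is a minimal sketch of how an adjusted p-value cutoff and a log2(FC) cutoff are typically combined. It is not CiberAMP's internal code: the helper filter_deg, the toy table and the 0.05 cutoff are assumptions made for illustration, and whether the package uses inclusive or strict comparisons is not stated here.

```r
# Toy differential-expression table; all values are invented for illustration.
dea <- data.frame(
  Gene_Symbol = c("GENEA", "GENEB", "GENEC", "GENED"),
  log2FC      = c(2.3, -1.4, 0.4, -3.1),
  FDR         = c(0.001, 0.03, 0.0005, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of CiberAMP): keep genes whose adjusted
# p-value is at or below fdr.cut and whose absolute log2(FC) is at or
# above fc.cut.
filter_deg <- function(tab, fdr.cut = 0.05, fc.cut = 1) {
  tab[tab$FDR <= fdr.cut & abs(tab$log2FC) >= fc.cut, , drop = FALSE]
}

deg <- filter_deg(dea, fdr.cut = 0.05, fc.cut = 1)
print(deg$Gene_Symbol)
```

In this toy table, GENEC is dropped because its fold change is too small and GENED because its adjusted p-value is too high, so both criteria must hold at once.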

In this example, we will run the analysis with default parameters:

x <- ciberAMP(genes = as.character(genes$hgnc_symbol), cohorts = c("HNSC", "LUSC"))

Looking into CiberAMP results

CiberAMP returns a list of 3 data frames, which can be accessed as follows.

The x[[1]] data frame contains all differentially expressed genes between (1) tumor vs normal and (2) copy number altered vs diploid tumor samples:

Column 1 -> queried gene approved symbol.
Columns 2:4 -> queried genes tumor vs normal differential expression results.
Column 5 -> TCGA cohort ID.
Columns 6:9 -> queried gene SCN-altered vs. diploid tumor samples differential expression results.
Column 10 -> TCGA cohort.
Column 11 -> SCN-altered vs. diploid tumor samples comparison: amplified vs. diploid or deleted vs. diploid.
Column 12 -> % of samples SCN-altered.
Column 14 -> SCN-altered TCGA sample barcodes.

#To get the first queried gene (first row) results.
head(x[[1]][1, ])

The x[[2]] data frame contains the differential expression results for the COSMIC Cancer Gene Census (CGC) gene list. Its columns are the same as those of x[[1]].

head(x[[2]][1, ])

Finally, the x[[3]] data frame contains the co-occurrence between the SCNVs of the SCNV-associated differentially expressed genes and the amplifications and deletions of known cancer driver genes:

Column 1 -> queried gene approved symbol.
Columns 2:5 -> queried gene copy number altered vs. diploid tumor samples DE results.
Column 6 -> TCGA cohort ID.
Column 7 -> SCN-altered vs. diploid tumor samples comparison type: amplified vs. diploid or deleted vs. diploid.
Column 8 -> % of tumor SCN-altered samples for a queried gene.
Column 9 -> queried gene SCN-altered TCGA tumor samples' barcodes.
Column 10 -> COSMIC CGC gene approved symbol.
Columns 11:14 -> COSMIC CGC gene copy number altered vs. diploid tumor samples DE results.
Column 15 -> TCGA cohort ID.
Column 16 -> SCN-altered vs. diploid tumor samples comparison type: amplified vs diploid or deleted vs diploid.
Column 17 -> % of tumor SCN-altered samples for a COSMIC CGC gene.
Column 18 -> COSMIC CGC gene SCN-altered TCGA tumor samples' barcodes.
Column 19 -> % of overlapping queried and COSMIC CGC genes SCN-altered tumor samples.
Column 20 -> % of overlapping COSMIC CGC and queried genes SCN-altered tumor samples.

head(x[[3]][1, ])
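
The two overlap percentages in the last columns can be read as the same set of shared samples expressed against two different denominators. The sketch below illustrates that reading with invented sample IDs. It assumes that one percentage is relative to the queried gene's SCN-altered samples and the other to the CGC gene's; this is an interpretation of the column descriptions, not a statement of how CiberAMP computes them.

```r
# Hypothetical sample IDs; real TCGA barcodes are longer.
queried.altered <- c("S01", "S02", "S03", "S04")
cgc.altered     <- c("S03", "S04", "S05", "S06", "S07", "S08", "S09", "S10")

# Samples in which both genes are SCN-altered.
shared <- intersect(queried.altered, cgc.altered)

# Same overlap, two denominators (assumed reading of the two columns).
pct.of.queried <- 100 * length(shared) / length(queried.altered)
pct.of.cgc     <- 100 * length(shared) / length(cgc.altered)

print(c(queried = pct.of.queried, cgc = pct.of.cgc))
```

The two values differ whenever the genes are altered in different numbers of samples, which is why both directions are worth reporting.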

Mapping results into chromosomes

From the original 1870 genes we found that:

114 SCN-altered queried genes show significant SCN-driven differential expression in the TCGA-HNSC cohort.
201 SCN-altered queried genes show significant SCN-driven differential expression in the TCGA-LUSC cohort.
Only 67 of them were found in both cohorts.

These genes can be mapped to their chromosomal positions:

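library(chromPlot)
data(hg_gap)  # gap annotation shipped with chromPlot
df <- x[[1]]
genes_cosmic <- x[[3]]
# NOTE: `gr` is not defined in this document. The code below assumes it holds
# the genomic coordinates of the queried genes (for example, built from the
# `genes` table above) and has an hgnc_symbol column.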
gr.assoc.hnsc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "HNSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ]
gr.assoc.lusc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "LUSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ]
both <- intersect(gr.assoc.hnsc$hgnc_symbol, gr.assoc.lusc$hgnc_symbol)
chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], chr = c(3,5,8))
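
The per-cohort selection and intersection used above can be tried on a small table without downloading any data. This is a sketch under assumptions: the column names are taken from the code above, but the toy values and the helper scn_deg are invented for illustration.

```r
# Toy stand-in for x[[1]]; column names follow the code above, values are invented.
df.toy <- data.frame(
  Gene_Symbol      = c("GENEA", "GENEB", "GENEC", "GENEA", "GENEC", "GENED"),
  Tumor            = c("HNSC", "HNSC", "HNSC", "LUSC", "LUSC", "LUSC"),
  log2FC.SCNAvsDip = c(1.8, -1.2, 0.3, 2.1, 1.5, -2.0),
  FDR.SCNAvsDip    = c(0.001, 0.004, 0.001, 0.002, 0.2, 0.003),
  stringsAsFactors = FALSE
)

# Hypothetical helper: symbols of genes with significant SCN-associated
# differential expression in one cohort.
scn_deg <- function(df, cohort, fc.cut = 1, fdr.cut = 0.01) {
  keep <- df$Tumor == cohort &
    abs(df$log2FC.SCNAvsDip) >= fc.cut &
    df$FDR.SCNAvsDip <= fdr.cut
  unique(df$Gene_Symbol[keep])
}

hnsc.genes   <- scn_deg(df.toy, "HNSC")
lusc.genes   <- scn_deg(df.toy, "LUSC")
shared.genes <- intersect(hnsc.genes, lusc.genes)

print(c(HNSC = length(hnsc.genes), LUSC = length(lusc.genes), both = length(shared.genes)))
```

Applied to the real results, the three lengths would correspond to the per-cohort and shared gene counts reported above.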

To check whether these genes lie close to any SCN-altered CGC gene in the cohort, we plot them (red bars) together with the co-occurring CGC genes (yellow bars):

library(chromPlot)
df.cosmic <- x[[2]]
gr.assoc.hnsc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "HNSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ]
gr.assoc.lusc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "LUSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ]
both.cosmic <- intersect(gr.assoc.hnsc.cosmic$hgnc_symbol, gr.assoc.lusc.cosmic$hgnc_symbol)
chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], annot2 = gr[gr$hgnc_symbol %in% both.cosmic, ], chr = c(3,5,8), chrSide=c(-1,1,1,1,1,1,1,1))

Ggplotting the results

The CiberAMP package provides a ggplot-based function to create a scatter plot where:

Log2(FC) values of mRNA differential expression between tumor and normal samples are plotted on the X axis.
Log2(FC) values of mRNA differential expression between SCN-altered and diploid tumor samples are plotted on the Y axis.
Amplified genes are plotted as triangles and deleted genes as circles.
Dot size is directly proportional to the % of samples in which the gene is SCN-altered.
Dot color represents each analyzed TCGA cohort.

In the end, a significantly altered gene can be classified into one of 8 scenarios:

x-value < 0; y-value > 0 | Downregulated in tumor vs. normal samples, with a very high expression in SCN-altered tumor samples.
x-value = 0; y-value > 0 | Upregulated exclusively in SCN-altered tumor samples.
x-value > 0; y-value > 0 | Upregulated in tumor vs. normal samples, with an even higher expression in SCN-altered tumor samples.
x-value != 0; y-value = 0 | Deregulated (up or down) in tumor vs. normal samples without any SCN-driven deregulation.
x-value < 0; y-value < 0 | Downregulated in tumor vs. normal samples, and even more repressed in SCN-altered tumor samples.
x-value = 0; y-value < 0 | Downregulated exclusively in SCN-altered tumor samples.
x-value > 0; y-value < 0 | Upregulated in tumor vs. normal samples, but repressed in SCN-altered tumor samples.
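
The scenarios listed above depend only on the signs of the two log2(FC) values, so they can be assigned with a small lookup. The function below is a hypothetical helper, not part of CiberAMP, and it assumes that a value of exactly 0 stands for "no significant change" on that axis.

```r
# Hypothetical helper (not part of CiberAMP).
# x: log2(FC) tumor vs. normal; y: log2(FC) SCN-altered vs. diploid tumors.
# Assumption: 0 means "no significant change" on that axis.
classify_scenario <- function(x, y) {
  labels <- c(
    "-1 1"  = "down in tumor, up in SCN-altered",
    "0 1"   = "up only in SCN-altered",
    "1 1"   = "up in tumor, higher in SCN-altered",
    "-1 0"  = "down in tumor, no SCN-driven change",
    "1 0"   = "up in tumor, no SCN-driven change",
    "-1 -1" = "down in tumor, lower in SCN-altered",
    "0 -1"  = "down only in SCN-altered",
    "1 -1"  = "up in tumor, down in SCN-altered"
  )
  # Build a key from the signs of both values, e.g. "-1 1".
  key <- paste(sign(x), sign(y))
  # Genes with no change on either axis have no label and return NA.
  unname(labels[key])
}

print(classify_scenario(c(-2, 0, 3), c(1.5, -1, 0)))
```

Splitting the "x-value != 0; y-value = 0" row into its up and down cases gives the eight labels used here.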

Let's take a look at those 67 genes:

library(ggplot2)
df.exp <- x[[1]]
df.exp <- df.exp[df.exp$Gene_Symbol %in% both, ]
ggplot.CiberAMP(df.exp)

Interacting with the results

Finally, CiberAMP provides a function to interactively explore your results. In this app, you can filter genes by:

Gene symbol
Minimum % of CN-altered samples
Log2(FC) threshold for mRNA differential expression between tumor vs. normal samples outcomes.
Log2(FC) mRNA differential expression between SCN-altered vs. diploid tumor samples outcomes.
TCGA cohorts IDs.

Clicking on any dot displays a table with the % of samples in which the SCN alteration of a deregulated queried gene co-occurs with the SCNAs of any CGC oncodriver.

Tip: click on the 'submit' button to initialize the graph. If 'ALL' is written in the box, every SCN-altered queried gene from your results will be plotted.

library(shiny)
library(plotly)
library(DT)
int.plot.CiberAMP(df.exp, x[[3]])

Session info

sessionInfo()

vqf/ciberAMP documentation built on April 12, 2022, 12:45 p.m.
